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Abstract 

Background: The rise of systems biology and availability of highly curated gene and molecular information 
resources has promoted a comprehensive approach to study disease as the cumulative deleterious function of a 
collection of individual genes and networks of molecules acting in concert. These "human disease networks" (HDN) 
have revealed novel candidate genes and pharmaceutical targets for many diseases and identified fundamental 
HDN features conserved across diseases. A network-based analysis is particularly vital for a study on polygenic 
diseases where many interactions between molecules should be simultaneously examined and elucidated. We 
employ a new knowledge driven HDN gene and molecular database systems approach to analyze Inflammatory 
Bowel Disease (IBD), whose pathogenesis remains largely unknown. 

Methods and Results: Based on drug indications for IBD, we determined sibling diseases of mild and severe 
states of IBD. Approximately 1,000 genes associated with the sibling diseases were retrieved from four databases. 
After ranking the genes by the frequency of records in the databases, we obtained 250 and 253 genes highly 
associated with the mild and severe IBD states, respectively. We then calculated functional similarities of these 
genes with known drug targets and examined and presented their interactions as PPI networks. 

Conclusions: The results demonstrate that this knowledge-based systems approach, predicated on functionally 
similar genes important to sibling diseases is an effective method to identify important components of the IBD 
human disease network. Our approach elucidates a previously unknown biological distinction between mild and 
severe IBD states. 

Keywords: Inflammatory bowel disease (IBD), Disease related genes, Protein-protein interaction networks, GO 
based functional score, Interpretation of pathogenesis 



Background 

Inflammatory Bowel Disease (IBD) is a chronic disease of 
unknown etiology that causes inflammation and ulcer in 
intestinal mucosa. Although IBD is still much less prevalent 
in Japan than in Western countries, the number of Japanese 
IBD patients has rapidly increased in the last 20 years [1]. 
This rising trend, also observed in the Asia-Pacific region 
[2,3] indicates that IBD is rapidly becoming a world-wide 
disease. There are two major sub-categories of IBD: 
Crohn's disease (CD) and ulcerative colitis (UC) [4]. 
Although the pathogenesis of IBD is not fully explained, 
genetic factors are suggested to contribute to dysregulation 
of intestinal immunity, leading to gastrointestinal injury. 
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A genetic study of IBD was first reported in 1988 as an 
epidemiological study of CD patients [5]. Genome-wide 
scanning (GWS) studies have revealed nine IBD suscept- 
ibility loci (IBD1-9) [6] and one susceptibility gene (NOD2) 
[7]. Genome- Wide Association Studies (GWAS) and corre- 
sponding meta-analyses identified 71 susceptibility loci for 
CD [8] and 47 loci for UC [9]. Recently the genetic sus- 
ceptibility to IBD was comprehensively reviewed in [10]. 
Another study based on and analysis of molecular path- 
ways suggested a significant overlap between IBD and 
autoimmune disorder, type 1 diabetes, ankylosing spondyli- 
tis (AS), multiple sclerosis, asthma [11], and rheumatoid 
arthritis (RA) [12]. 

IBD medical treatment policy and drug selection is 
determined according to IBD 'severity' (mild, moderate, 
or severe) [4,13-15], which reflects the frequency of rectal 
bleedings and stools as well as mucosal appearances on 
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sigmoidoscopy. Aminosalicylate (e.g., Mesalazine) is a pri- 
mary drug for the mild state, and anti-TNF antibody (e.g., 
Infliximab) is a primary drug for the severe state. Both 
drugs act on inhibition of inflammatory cascades; how- 
ever, while the anti-TNF antibody specifically interacts 
with TNF, aminosalicylate is similar to other members of 
nonsteroidal anti-inflammatory drugs (NSAIDs) having 
several molecular targets. 

Recently, human diseases have been studied with a sys- 
tems approach to address the need to understand the 
network of genes and molecules acting in concert to pro- 
duce the pathogenesis and progression of the disease. A 
human disease network (HDN) is a representation of the 
etiology of the disease by protein-protein interaction 
(PPI) data and information. Goh et al. constructed a 
comprehensive HDN to cover all diseases and showed 
that essential human genes encoded hub proteins in the 
network [16]. Hase et al. investigated physical properties 
of the comprehensive HDN, and elucidated that there 
were extensive interconnections among middle-degree 
nodes that formed the backbone of the network [17]. 
Hase et al. also investigated drug-target genes in the 
HDN, and found significant preference of drug targets 
for middle- and low-degree nodes [17]. 

Wall et al. studied the HDN for autism. They con- 
structed the network by genes related to autism and other 
neurological disorders in order to elucidate common fea- 
tures in the diseases [18]. They identified 154 genes not 
previously linked to autism. Based on this study, Wall 
et al. developed a web tool for comprehensive genetic 
annotation of diseases called "Genotator" [19]. Genotator 
provides an up-to-date and comprehensive collection of 
disease genes and a reliable gene-to-disease ranking for 
any disease. They showed that the integration of all the 
available databases gave a more complete picture than any 
one database alone, using Alzheimer disease as a case 
study. 

The idea of Genotator inspired us to investigate IBD by 
a knowledge driven HDN gene and molecular database 
systems approach. Like autism and Alzheimer disease, IBD 
is known to be a complex disease with a large number of 
genes and molecules implicated in the etiology of the dis- 
ease but without direct experimental evidence. A primary 
difference between our approach and Wall's is our use of 
available drugs, their targeted pathways and genes and 
related drug treatment information to highlight the more 
important genes to the disease (in this case IBD). We 
assume that current effective drug treatments and their 
targets provide essential information that will help identify 
key pathways important in the pathogenesis and progres- 
sion of the disease. We collected genes related to IBD and 
its drugs, constructed a disease network with the genes, 
and investigated the functional similarity of drug targets to 
the putative IBD genes. The human disease network 



results demonstrate a new approach to characterizing IBD 
and its progression from early stage into the chronic and 
more malignant state. 

Methods 

Drug information on IBD and sibling diseases 

We use the World Gastroenterology Organization [4] 
guidelines and IBD societies of USA [20], Europe [21-24], 
and Asia [2,3] to identify drugs for IBD and sibling dis- 
eases. Additional drug information was collected from 
PubMed and the Cochrane Library databases. The drug 
indications were investigated by databases: Package Inserts 
Database (in Japanese) [25] for Japanese drugs, FDA 
Approved Drug Products [26] for US drugs, and the elec- 
tronic Medicines Compendium (eMC) [27] for UK drugs. 
Thereafter, target molecules and pathways of a drug were 
retrieved from the DrugBank database [28]. We then 
selected a target molecule of all the drugs (compendium 
of drugs are found in Table 1). 

Collection of disease genes from public databases 

The basic gene association approach originated from 
Genotator [19] which collected evidence for the associa- 
tion of gene to disease from multiple databases and then 
scored the evidence thus providing an evidenced-based 
ranked list of genes associated with a given disease. Here, 
we collected IBD and sibling disease genes from HuGE 
Navigator, PharmGKB, GeneCards and Genetic Associa- 
tion Database (GAD). We downloaded the PharmGKB 
and GAD datasets and 'scraped' genes from the HuGE 
Navigator and GeneCards web applications. All data was 
selected in August, 2010. On the gene collection, we 
employed all the variation of synonymous names of IBD 
and sibling diseases. The collection of genes were then 
merged by matching EntrezGene IDs. If a database did 
not provide a gene's EntrezGene IDs, we used the 
DAVID bioinformatics tools [29,30] to retrieve the 
EntrezGene IDs. 

Protein-Protein Interaction network 

A Protein-Protein Interaction (PPI) network was produced 
by STRING8 (October, 2010) using at least one experi- 
mental association result, information from PPI databases, 
including additional edges of one path (depth = 2), and 
with high confidence (confidence > 0.9). We used CYTOS- 
CAPE [31] to visualize the PPI networks. 

Semantic similarity between two genes 

We employed the same scoring as [32]. First, define 

ic[c) = -io g (Kc)) (i) 

where p(c) is the frequency of annotation of the term 
c and its children in the GO graph. Here, Information 
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Table 1 A list of drugs for Inflammatory Bowel Disease and their indications 



IBD state 


Category 


Drug name 




Indication 








US 


UK Japan 


Mild 


Amino-salicylate 


Mesalazine 


UC, CD 


UC, CD UC, CD 






Salazosulfapyridine 


UC, CD, RA 


UC, CD, RA UC, CD, RA 






Olsalazine 


UC 


UC 






Balsalazide 


UC 


UC 


Moderate 


Immuno-modulators 


Azathioprine 


RA, GVHD 


RA, SLE, DM, AICAH, PM, PV, GVHD, CD 
PN, AHA ITP 






T;a(~rf"il i m i ic 
1 aLI Ul 1 1 1 IUj 


PTR 


PTR PTR, GVHD, MG, RA, LN, UC 




Corticosteroids 


Budesonide (Entocort only) 


CD 


CD 






Prednisolone 


RA, UC (chronic), Nephrotic Syndrome, Collagen Diseases, Fulminating SLE, Allergic 
Conditions, Bronchial Asthma, Acute Skin Diseases, Thrombocytopenia, Organ 
transplantation. 


Severe 


Anti-TNF antibody 


Infliximab 


UC, CD, RA, PS, AS 


UC, CD, RA, UC, CD, RA, 
PS, AS PS, AS, BD 






Adalimumab 


CD, RA, PS, AS 


CD, RA, PS, AS 






Certolizumab 


CD, RA 


RA 



Abbreviations: CD Crohn's disease, UC Ulcerative colitis, RA Rheumatoid artitis, PS Psoriasis (including PS Psoriatic arthritis), AS Ankylosing spondylitis, BD Bachet 
disease, GVHD Graft versus host disease, SLE Systemic lupus erythematosus, DM Dermatomyosis, PM Polymyositis, AICAH Auto-immune chronic active hepatitis, PV 
Pemphigus vulgaris, PN Polyarteritis nodosa, AHA Auto-immune haemolytic anaemia, ITP Idiopathic thrombocytopenic purpura, MG Myasthenia gravis, LN Lupus 
nephritis, PTR Platelet transfusion refractoriness 



Content (IC) is a measure of how informative a term c is 
relative to other terms [33]. Second, define the similarity 
between two GO terms c 2 and c 2 as: 



SIM term (ci,c 2 ) 



arg max (/C(c)) 

ceA(ci,c 2 ) 



(2) 



where A(c lt C2) is the set of common ancestors of Cj and 
c 2 . SIM term (c x,c 2 ) is defined as the most informative com- 
mon ancestor of Cj and c 2 [34,35]. Then, define the simi- 
larity between two GO term sets A and B {SIM termset (A, 
B)), assuming that the gene g A and the gene g B are anno- 
tated respectively with n and m GO terms as A = {G0 1A , 
G0 2A ,...,GO nA }, B = {G0 1B , G0 2B ,...,GO mB }. SIM terms JA,B) 
is defined as summation of the maximum similarities 
between a term in set A and any of terms in set B, normal- 
ized by summation of IC of the terms in set A: 



SIM, emt {A,B) ■ 



£ ar g max SIM len „ (GO, A , GO) 

~[ GOtB 

£ arg max Sm urm (GO iA , GO) + £ K^GO^) 

(.1 CO*B i-1 



(3) 



Please note that SIM termset (A, B) is not equal to SIM- 
termset(B, A). Then the expected similarity SSM between 



the g A and the g B is defined as: 

ccw , . SIM termset {A,B)+SIM 

termset 

SSM[g A ,g B ) = 



(4) 



(symbol for member of) 0[1]. A pseudo-code for SSM 
calculation is available in Additional file 1. We calculated 
the random distribution of the SSM score; 1,000 randomly 
collected human genes were tested against the drug targets 



for IBDmild and IBDsevere separately. Both distributions 
showed upper quartile threshold scores to be 0.5 (0.50760 
and 0.50940 for IBDmild and IBDsevere respectively), and 
we determined that the upper quartile was best suited to 
define both those genes most likely closely associated with 
the disease but least likely to be unimportant. Conse- 
quently, we used the 0.5 threshold as criteria for our SSM 
scoring. 

Gene score for record frequency 

Define Record Frequency RF g as the total frequency of 
records of a certain gene related to the sibling disease 
{sib) contained in the database (db): 



RF, 



E E bf&j) M 

jedbiesib N re f 

logl x — -J 

V N db N sib DF> 



(5) 



where N^b is the total number of tested databases (four 
in this study: HugeNavigator, PharmGKB, GeneCards, 
and GAD), N sib is the total number of the sibling dis- 
eases, N re t is the total number of papers reporting on any 
human gene, and DF g is the frequency of papers report- 
ing a certain gene. We calculated DF g and N re f by Entrez 
Programming Utilities provided by NCBI. When calculat- 
ing DF g , we replaced a name of a certain gene with its 
synonyms obtained from a complete gene information 
table ("Homo_sapiens.gene_info") provided by NCBI. N re f 
was evaluated by the number of papers annotated with 
MeSH terms of "homo sapiens" and "gene" or "protein". 
N ref ~23 million (Aug 10th, 2010). 
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Gene expression data and analysis 

We retrieved 12 data sets from GEO and 13 data sets from 
ArrayExpress by a query with keywords of "Crohn's dis- 
ease, CD, ulcerative colitis, UC, inflammatory bowel dis- 
ease, IBD, homo sapiens". After investigation of individual 
data sets, only three of them (GSE6731 [36], GSE9452 
[37], and E-TABM-118 [38]) consist of patients with CD 
and UC and include patient information on medicinal 
drugs the patients took. Consistent with our fundamental 
approach, we classified the patients prescribed aminosali- 
cylate into "patients in mild state" (IBDmild) and the 
patients prescribed anti-TNF antibody into "patients in 
severe state" (IBDsevere). Since E-TABM-118 includes 
very few patients in the severe state, we focused on differ- 
ential expressed gene analysis with GSE6731 and GSE9452 
using the "samr" package in R [39] between IBDmild and 
IBDsevere patients and data sets. Genes of FDR < 0.05 
were selected (2,549 genes from GSE6731 and 17 genes 
from GSE9452) and mapped the genes to our PPI network 
produced by STRING8 resulting in 28 genes from 
GSE6731 (listed in Additional file 2: Table S4) and no 
genes from GSE 9452 mapped to the PPI network. 

Results 

Determination of sibling diseases for the mild and severe 
states of IBD 

As in autism spectrum disorder [18], we assumed genes 
common among the IBD network of sibling diseases 
would represent a molecular mechanism essential to all 
sibling diseases including IBD. The sibling diseases were 
determined as follows. 

The World Gastroenterology Organization defines three 
categories of malignancy in IBD, mild, moderate, and 
severe states in its global guidelines for IBD treatment [4] . 
There is also clear categorization in drug indication to IBD 
according to the malignancy: aminosalicylate drugs for 
mild state patients, immunomodulator drugs and corticos- 
teroid drugs for moderate state patients, and anti-TNF 
antibody drugs for severe state patients (Table 1). We then 
assumed the drugs provided the key to determine known 
sibling diseases for IBD. Table 1 shows diseases to which 
the same drugs as IBD are indicated in US, UK, and Japan. 
CD, UC, and RA are representative diseases treated with 
the drug of the mild state (aminosalicylate), while PS, AS, 
and BD are representative diseases treated using the drug 
of the severe state (anti-TNF antibody). Although the 
severe state also includes CD, UC, and RA, the three dis- 
eases were eliminated from the sibling group for the severe 
state (Table 1) to investigate the net difference between the 
mild and severe states. We therefore determined these dis- 
eases as the sibling diseases of IBD for its mild: IBDmild = 
{CD, UC, RA} and severe: IBDsevere = {PS, AS, BD} states 
(as used above). The moderate state was neglected from 



this study because we aimed at a comparison between the 
most extreme malignancy IBD states. 

Collection of disease genes from public databases 

Figure 1 shows the number of genes related to the two 
sibling diseases, IBDmild and IBDsevere. Genes for each 
sibling disease (1,264 for IBDmild and 869 for IBDsevere, 
Figure 1) were collected from the public databases: 
HugeNavigator, GeneCards, PharmGKB, and GAD. For 
each gene, g, we calculated the ranking score RFg by the 
frequency of the gene in databases and the frequency of 
the gene in PubMed according to equation (5). RFg was 
calculated for all genes associated with each sibling dis- 
ease. We then determined a threshold score for each sib- 
ling disease used to exclude genes without significant 
evidence supporting their association with the sibling dis- 
ease. To define the thresholds, we identified all genes in 
the two sibling disease gene sets which have verified asso- 
ciation with IBD by direct literature review. We then 
scored all verified genes and chose the gene with lowest 
score in each sibling gene set (TNF for IBDmild and CRP 
for IBDsevere). Applying these thresholds to the RFg 
ranked list of sibling genes resulted in 250 and 253 genes 
for IBDmild and IBDsevere respectively. The intersection 
of the two sibling disease sets was 94 genes. 

All genes in the IBDmild and IBDsevere gene sets were 
tested for Protein-Protein Interactions (PPIs) with 
STRING8 [40] and those with at least one published 
experimental result and information from the PPI data- 
bases support were included in subsequent HDN analysis. 
A total of 146 (IBDmild) and 151 (IBDsevere) genes had at 
least one validated PPI relationship (Figure 1). We define 
the collective PPI networks as the "HDN" for IBD. 

Significant genes in IBDmild and IBDsevere 

Our goal is to identify significant genes representing dif- 
ferences between IBDmild and IBDsevere. We focused 
on drugs that treat either IBDmild or IBDsevere. Amino- 
salylate was the drug specific to IBDmild, and anti-TNF 
antibody was the drug specific to IBDsevere (Table 1). 
In a cell, aminosalylate inhibits PTGS1, PTGS 2, ALOXS, 
and PPARG. These four genes were defined as drug tar- 
gets for IBDmild in this study. On the other hand, anti- 
TNF antibody inhibits TNF receptor 1A and IB. These 
two genes were defined as drug targets for IBDsevere. 
We consider a gene to be functionally similar to these 
drug target genes as a significant gene potentially repre- 
senting differences between IBDmild and IBDsevere. 
Such a functional similarity can be measured by a score 
provided by Gene Ontology (GO). There have been 
reports on application of the functional similarity score 
to the analysis of PPI networks [33-35]. We employed 
the GO-based similarity score (SSM in equation (4)) to 
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Step 1 ) The number of genes retrieved from each public database 



Disease name 


HuGE Navigator 


PharmGKB 


Gene Cards 


GAD 


Crohn's disease (CD) 


383 


89 


162 


162 


Ulcerative colitis (UC) 


232 


23 


796 


187 


Rheumatoid arthritis (RA) 


457 


151 


106 


177 


Psoriasis (PS) 


181 


52 


684 


130 


Ankylosing spondylitis (AS) 


86 


5 


215 


77 


Behcet disease (BD) 


107 


0 


95 


72 



Step 2) The sibling diseases and the number of genes for each state 



Disease state 


Sibling disease 


Total gene 


Threshold 


Selected genes by RFg score 


IBDmild 


CD, UC, RA 


1,264 


TNF 


250 


IBDsevere 


PS, AS, BD 


869 


CRP 


253 


n 


94 genes are common 



Step 3) The number of genes (nodes) in HDNs for IBD and selected genes by SSM score 



HDN 
IDBmild 


Total gene 
146 


Drug target 

prostaglandin G/H synthetase 1 and 2, 
arachidonate 5-lipoxygenase, PPARG 


Selected genes by SSM score 
46 


IDBsevere 


151 


TNFRSF1A, TNFSRF1B 


62 



Figure 1 The number of genes at each step of the analysis 

V J 



our HDNs to highlight functional similarity of a gene to 
the drug targets. 

Figure 2 shows the HDNs whose nodes were selected by 
the functional similarity of a gene to the drug targets. A 
node scored less than 0.5 (upper quartile of the random dis- 
tribution of SSM) were eliminated from the HDNs. A trian- 
gular node indicates a gene belongs to both HDNs of 
IBDmild and IBDsevere, while a circular node represents a 
gene belongs to either HDN of IBDmild or IBDsevere. 
Additional file 3: Table S2 lists the genes consist of the 
HDNs. The similarity score in this study showed a mean 
score of 0.40050, with a standard deviation of 0.13942, a 
maximum score of 0.86640, a minimum score of 0.02972, 
and a upper quartile of 0.50760 against IBDmild drug tar- 
gets, and a mean score of 0.37570, with a standard devia- 
tion of 0.17189, a maximum score of 0.91200, and a 
minimum score of 0.02417, and a upper quartile of 0.50940 
against IBDsevere drug targets. 

Differences between the HDNs of IBDmild and IBDsevere 

Table 2 shows the functional classification of genes in the 
HDNs of IBDmild and IBDsevere. Both HDNs consist of 



similar kinds of functional groups: i.e., inflammation, 
innate and acquired immune response, apoptosis, tumor- 
igenesis, and tissue remodeling. However, the HDN of 
IBDsevere includes genes of tumorigenesis and apoptosis 
larger in number than the HDN of IBDmild (encircled by 
a green area in Figure 2(A) and 2(B)). The HDN of IBD- 
mild also includes some genes of tumorigenesis and 
apoptosis, but the genes stay peripherally around a cen- 
tral gene group of inflammation and immunoregulation 
(encircled by a rose area). In contrast, the HDN of IBDse- 
vere includes close interconnection among genes from 
the tumorigenesis and apoptosis group. This feature was 
not observed in the HDN of IBDmild. 

Both in the HDNs of IBDmild and IBDsevere, some 
genes of inflammation and immunoregulation have highly 
condensed interactions with surrounding genes. Interest- 
ingly, numbers of them were detected by the GWAS stu- 
dies [8,9] (Additional file 4: Table S3). These genes (IL2, 
IL12B, IL23R, IFNG, and JAK2) indicate significant func- 
tionalities in the IBD pathogenesis. Besides, differentially 
expressed gene between IBDmild and IBDsevere are also 
included in the condensed interconnections of genes. 
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(A) 



tissue 

remodeling serpinei 



tumorigenesis 
and apoptosis 



PTPN11 JGFB1 TLR4 TLR<s) (TLRl) 



CD14 TLR2 




inflammation and 
immunoregulation 



CB) 



)^ MMP3 

plat! ^- — ' ' — 

tissue "^\A 
remodeling serpinei IRF5 

SELE i / 



tumorigenesis 
and apoptosis 

TNFRSF 




inflammation and 
immunoregulation 



Figure 2 Protein-protein interactions among the genes with high functional similarities to the drug targets. (A) A network composed of 
functionally similar genes to prostaglandin G/H synthetase I and 2, arachidonate 5-lipoxygenase, and PPARG, all of which are drug targets for 
IBDmild. (B) A network composed of functionally similar genes to TNF receptor I A and IB, which are drug targets for IBDsevere. In each network, a 
triangle node indicates a gene common to the HDNs of IBDmild and IBDsevere, a circle node indicates a gene specific either the HDN of IBDmild 
or IBDsevere. A node bordered with an orange line indicates a gene detected by GWAS. An edge connecting the GWAS genes is also 
highlighted in pale orange. A node marked with an arrow indicates a differentially expressed gene in GSE6731. A blue arrow indicates an up- 
regulated gene in IBMmild, while a red arrow indicates an up-regulated gene in IBMsevere. Both networks are divided into three portions by 
colored areas. The blue area indicates genes relate to tissue remodeling. The green area indicates genes relate to inflammation and 
immunoregulation. The rose area indicates genes relate to tumorigenesis and apoptosis. 
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Table 2 Functional classification of genes consist of HDNs for IBD 

HDN Functional category Gene 

IBDmild immunoregulation ICAM1, IL2, IL23A, IL23R, 114, CRP, IL11, 

IL2RA, IL4R, IRAK3, TYK2 

innate immunity CD14, TLR2, TLR4, TLRl, TLR6, IFNG, IL8 

inflammation and immunoregulation CCL2, IL12B, 1117, ILIA, IL1RN, NFKB1, 

NFKBIA, CCLI1, CCL26, ILIORA, ILI8RAP 

inflammation and apoptosis ILIB, IRF5, TNFSF1A, ILI5, JAK2, STAT6 

cell growth, apoptosis and tumorigenesis TP53, PTPN 1 1, STAT3, TGFB1 

atherosclerosis and tissue remodeling SERPINEI, TIMPI, A2M 

diabetes related PPARG 



prostaglandin biosynthesis PTG5I, PTG52 

IBDsevere immunoregulation ICAMI, IL2, IL23A, IL23R, IL4, B2M, CD28, CX3CLI, CTLA4, HLA-C, KLRC1, ILI2A, 

ILI2RB2, STAT2, STAT4 

innate immunity CD/4, TLR2, TLR4, IFNG, IL8 

inflammation and immunoregulation CCL2, ILI2B, ILI7, ILIA, ILIRN, NFKBI, 

NFKBIA, CCL5, CSF2, ILIR1, IL6, IL6R, ILI8, NR3C1 

inflammation and apoptosis ILIB, IRF5, TNFSF1A, IL15 

apoptosis FASLG, TNFAIP3, TNFRSFIB, TRADD 

cell growth and tumorigenesis TP53, IGF1, IGFBP3, TGFBi, TGFBRI, S0CS1 

aging and tumorigenesis SODI, S0D2 

atherosclerosis and tissue remodeling SERPINEI, TIMPI, EDNI, MMP3, PLAT, SELE 

diabetes related PPARG, LEP, PPARA 



We annotated the genes by their functions according to EntrezGene database. 

They are IFNG, IL4, IL17A, NFKBI, and TGFBI (indicated 
by either blue or red arrow in Figure 2), selected by 
SAM statistics (FDR < 0.05) on gene expression data of 
GSE6731. While the genes of inflammation and immunor- 
egulation were up-regulated in IBDmild {IFNG, IL4, and 
IL17A, indicated by blue arrows), the genes of tumorigen- 
esis and apoptosis were up-regulated in IBDsevere (NFKBI 
and TGFBI, indicated by red arrows). 

The molecular picture of the progression of IBD 

IBD is characterized by the progression from early stage 
into chronic and more malignant states. The characteristic 
is widely used in the clinical treatment of IBD, but its 
molecular processes remains unclear. This study provides 
a molecular picture for the progression of IBD. The mole- 
cular picture tells us a difference between the mild and 
severe states, two extreme states in the progression of 
IBD. The biological details are described in "Discussion". 

Discussion 

Our knowledge driven HDN gene and molecular data- 
base systems approach consists of the following steps: 1) 
Determination of sibling diseases for IBD based on drug 
information, 2) Collection of IBD and sibling disease 
genes from multiple databases, 3) Scoring the disease 
genes by evidence-based ranking weighted by "the fre- 
quency in databases" and "the frequency in PubMed", 4) 



Evaluation of the disease genes for Protein-Protein Inter- 
action relations, and 5) Investigation of GO-based func- 
tional similarity of drug targets to the putative IBD genes. 
We summarized the criteria for our selection of genes 
specific to IBDmild and IBDsevere in Table 3. Our results 
may lead to an elucidation for IBD pathogenesis that 
remains largely unknown. 

Sibling diseases, closely aligned to a complex disease 
such as IBD, provide a novel opportunity to use compre- 
hensive omics data to identify a core biochemical or treat- 
ment pathway not previously identified in medical biology. 
If correctly defined, a collection of sibling diseases can 
'cover' the entire pathophysiological process of a target 
disease more completely than any one disease. Wall et al. 
determined autism sibling diseases based on commonly 
involved genes [18]. We defined IBD sibling diseases based 
on drug indications. Drugs act on changing pathogenic 
states of a disease. Accordingly, it is highly likely that the 
sibling diseases share a common molecular mechanism. 
This approach may be generalizable if drugs acting on sib- 
ling disease states, are available across the developmental 
progression of a given central disease. 

Figure 2 indicates that the HDNs for IBD change 
according to the state of the disease progression. Our 
results show that JAK2 and STAT3, which are known to 
be associated with IBD [41], are specific to the HDN of 
IBDmild (Figure 2(A)) [10]. These genes form the central 
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Table 3 Criteria for selection of genes specific to IBDmild and IBDsevere 



Criteria for selection 


No. of IBDmild 
genes 


No. of IBDsevere 
genes 


(1) Collect IBD-associated genes from databases for human diseases. 


1,264 


869 


(2) Rank the genes by the frequency of the IBD-association in the databases (equation (5)). Cut off the genes 
falling short of the lowest frequency of a gene validated its disease association by literatures {TNF for IBDmild 
and CRP for IBDsevere). 


250 


253 


(3) Select the genes interconnected by PPI evidence. 


146 


151 


(4) Rank the genes by GO based similarity score (SSM, equation (4)) to IBD drug targets. Cut off the genes 
under the upper quartile of the random distribution of SSM scores against IBD drug targets. 


46 


62 



part of the HDN of IBDmild and have links to pro- 
inflammatory genes (ILIA/ IB, IL12B, IL17A, and IL23A/ 
R), which are common in both HDNs. These suggest that 
JAK2 and STAT3 are key factors in the early stage of IBD 
pathology. On the other hand, the HDN of IBDsevere 
(Figure 2(B)) indicates a relation of IBD to more malig- 
nant diseases like cancer. Patients with long-standing 
IBD have an increased risk of developing colorectal can- 
cer [42]. Such a transition in functional classes of genes 
was also observed with differential expressed genes 
between IBDmild and IBDsevere obtained by GSE6731 
(nodes with arrows in Figure 2). 

The HDNs of IBDmild and IBDsevere include 13 genes 
obtained by GWAS reported in [8,9]. The 13 genes are 
highlighted by colored borders in orange in Figure 2 and 
listed in boldfaced in Additional file 3: Table S2. Most of 
the genes are interconnected with each other except 
ICAM1 and IRF5. All interconnected genes belong to the 
early response of inflammation, which includes cytokines, 
chemokines, receptors, and cellular signaling molecules. 
The other two genes, ICAM1 and IRF5, belong to the late 
response of inflammation, i.e., enhancement of immune 
response. Our HDNs illustrate what molecules intervene 
between the two sibling disease pathways. Both in the 
HNDs of IBDmild and IBDsevere, TP53 and NFKB1I 
NFKBIA are common to both disease states, indicating 
that the transcriptional regulation intervenes between the 
early [IBDmild) and late {IBDsevere) responses. In this 
way, our HDN analysis and approach helps to clarify the 
molecular and therefore disease implications of GWAS 
candidate genes. 

A recent comprehensive review of molecular pathways 
for IBD pathogenesis [10] supports characteristic genes 
in our HDNs of IBDmild and IBDsevere. The genes 
characteristic of the HDN of IBDmild (Jak2, Stat3, and 
IL23) belong to Thl7-cell differentiation in [10], and the 
genes characteristic of the HDN of IBDsevere (IL12, 
IFN-gamma, IL18 and FASLG) belong to Thl-cell driven 
responses in [10,36]. The physiological balance between 
Thl and Thl7 may be deteriorated by environmental 
factors such as intestinal bacteria stress, which even- 
tually leads to autoimmune responses composing IBD 
("hygiene hypothesis" introduced by Strachan [43]). Our 



HDNs suggest a transition from Thl7 to Thl domi- 
nancy along with progression of malignancy [44]. Our 
approach stratifying the disease-related genes into IBD- 
mild and IBDsevere enables us to infer a clinically signif- 
icant transition of a state of a disease such as the Thl/ 
Thl7 transition. 

Conclusions 

In this study, we employed a knowledge driven human 
disease systems approach to analyze IBD, whose patho- 
genesis remains largely unknown. Based on drug indica- 
tions for IBD, we determined two sibling disease states of 
IBD (mild and severe). After ranking the genes by the fre- 
quency of the records, we obtained 250 and 253 genes for 
IBDmild and IBDsevere, respectively. We calculated func- 
tional similarities of these genes with IBD drug targets 
and drew their interactions as PPI networks we later 
defined as the two sibling disease HDNs. The HDNs 
revealed biological and clinical insights into the molecu- 
lar differences between IBDmild and IBDsevere. The 
results demonstrated that knowledge annotation of sib- 
ling disease HDNs with focus on high similarity genes is 
an effective approach to identify common genes and 
pathways important to the complex disease network. 

Additional material 



Additional file 1: Pseudo-code for SSM calculation 

Additional file 2: Table S4. Differential expressed genes between 
IBDmild and IBDsevere Resulted 28 genes from GSE6731: differentially 
expressed between IBDmild and IBDsevere (FDR < 0.05) and mapped to 
our PPI network produced by STRING8. 

Additional file 3: Table S2. Genes consisting of the HDNs Genes 
consisting of the HDNs of Figure 2 are listed in this Table. These genes 
were selected when being functionally similar to the drug targets of 
each state (SSM > 0.5), as well as when having PPI connections among 
themselves. Accordingly, the genes "common to both HDNs" are 
different between IBDmild and IBDsevere, because SSM scoring is different 
between the HDNs. Genes are boldfaced when they are susceptibility 
loci for IBD detected by GWAS [8,9]. Genes are underlined when they are 
selected as differential expressed genes by SAM statistics (FDR < 0.05) 
[39] with GSE6731. Abbreviations for gene names are lilsted in Additional 
file 5: Table S1. 

Additional file 5: Table SI. Abbreviations for gene names All the 

abbreviations for gene names used in this manuscript are listed. 
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Additional file 4: Table S3. Bond order of susceptibility gene for 
IBD detected by GWAS. GWAS susceptibility genes [8,9] are listed with 
their bond order in HDNs of IBDmild and IBDsevere. 
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